Ristretto is a vision-language model that allocates image tokens dynamically: the number of image tokens can be adjusted to match task requirements. It is reported to surpass previous generations in both performance and versatility.
Image-to-Text
Transformers
Supports Multiple Languages
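The description does not say how Ristretto adjusts its image token count, so the following is only a minimal sketch of one common way to do it: average-pooling a square grid of patch embeddings down to a smaller grid. The function name `pool_image_tokens` and its parameters are hypothetical and are not part of Ristretto or the Transformers library.

```python
from typing import List

Vector = List[float]


def pool_image_tokens(tokens: List[Vector], grid: int, target_grid: int) -> List[Vector]:
    """Average-pool grid x grid patch embeddings to target_grid x target_grid tokens.

    Hypothetical helper for illustration only; this is an assumed technique,
    not Ristretto's actual mechanism.
    """
    if len(tokens) != grid * grid:
        raise ValueError("expected grid * grid tokens in row-major order")
    if target_grid <= 0 or grid % target_grid != 0:
        raise ValueError("target_grid must be a positive divisor of grid")

    window = grid // target_grid  # side length of each pooling window
    dim = len(tokens[0])
    pooled: List[Vector] = []

    for out_row in range(target_grid):
        for out_col in range(target_grid):
            acc = [0.0] * dim
            # Sum every patch embedding that falls inside this window.
            for dr in range(window):
                for dc in range(window):
                    row = out_row * window + dr
                    col = out_col * window + dc
                    vec = tokens[row * grid + col]
                    for i in range(dim):
                        acc[i] += vec[i]
            pooled.append([value / (window * window) for value in acc])
    return pooled


# A toy 4 x 4 grid of 2-dimensional "patch embeddings" (16 image tokens).
tokens = [[float(i), 1.0] for i in range(16)]

coarse = pool_image_tokens(tokens, grid=4, target_grid=2)  # 4 image tokens
full = pool_image_tokens(tokens, grid=4, target_grid=4)    # all 16 image tokens
```

In this sketch a smaller `target_grid` yields fewer image tokens, which would suit tasks that need less visual detail, while a larger one keeps more tokens for fine-grained tasks such as reading text in an image.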